Combining Text and Heuristics for Cost-Sensitive Spam Filtering
Abstract
Spam filtering is a text categorization task with special features that make it both interesting and difficult. First, the task has traditionally been performed using heuristics from the domain. Second, a cost model is required to avoid misclassifying legitimate messages. We present a comparative evaluation of several machine learning algorithms applied to spam filtering, considering both the text of the messages and a set of heuristics for the task. Cost-oriented biasing and evaluation are performed.
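To make the role of a cost model concrete, here is a minimal sketch of one common way to bias a probabilistic filter toward protecting legitimate mail. It is not taken from the paper: the function names and the `cost_ratio` parameter are hypothetical, and the sketch assumes the classifier outputs a calibrated spam probability and that blocking a legitimate message is `cost_ratio` times as costly as letting a spam message through.

```python
def spam_threshold(cost_ratio: float) -> float:
    """Return the spam probability above which blocking minimizes expected cost.

    Assumption (not from the paper): blocking a legitimate message costs
    `cost_ratio` units, while letting a spam message through costs 1 unit.
    Blocking has expected cost (1 - p) * cost_ratio and passing has expected
    cost p, so blocking is cheaper only when p > cost_ratio / (1 + cost_ratio).
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return cost_ratio / (1.0 + cost_ratio)


def classify(p_spam: float, cost_ratio: float = 9.0) -> str:
    """Label a message given its estimated spam probability (hypothetical helper)."""
    return "spam" if p_spam > spam_threshold(cost_ratio) else "legitimate"
```

With equal costs the threshold is 0.5; raising `cost_ratio` pushes it toward 1, so fewer legitimate messages are blocked at the price of letting more spam through.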
Related resources
Stacking Classifiers for Anti-Spam Filtering of E-Mail
We evaluate empirically a scheme for combining classifiers, known as stacked generalization, in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial email, or “spam”, floods mailboxes, causing frustration, wasting bandwidth, and exposing minors to unsuitable content. Using a public corpus, we show that stacking can improve the eff...
A Memory-Based Approach to Anti-Spam Filtering
This paper presents an extensive empirical evaluation of memory-based learning in the context of anti-spam filtering, a novel cost-sensitive application of text categorization. Unsolicited commercial e-mail, also known as “spam”, floods the mailboxes of users, causing frustration, wasting bandwidth and money, and exposing minors to unsuitable content. Using a recently introduced publicly availa...
Active Multi-Field Learning for Spam Filtering
Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem, and proposes a universal approach to fighting various spam messages. The proposed active multi-field learning approach is based on: 1) It is cost-sensitive to obtain a label for a real-world spam filter, which suggests an active learning idea; and 2) Different messag...
A general-purpose sentence-level nonsense detector
I have constructed a sentence-level nonsense detector, with the goal of discriminating well-formed English sentences from the large volume of fragments, headlines, incoherent drivel, and meaningless snippets present in internet text. For many NLP tasks, the availability of large volumes of internet text is enormously helpful in combating the sparsity problem inherent in modeling language. Howev...
An evaluation of Naive Bayesian anti-spam filtering
It has recently been argued that a Naive Bayesian classifier can be used to filter unsolicited bulk e-mail (“spam”). We conduct a thorough evaluation of this proposal on a corpus that we make publicly available, contributing towards standard benchmarks. At the same time we investigate the effect of attribute-set size, training-corpus size, lemmatization, and stop-lists on the filter’s performan...
Publication date: 2000